Intermediate NumPy

Unidata Python Workshop


Overview:

  • Teaching: 15 minutes
  • Exercises: 20 minutes

Questions

  1. How do we work with the multiple dimensions in a NumPy Array?
  2. How can we extract irregular subsets of data?
  3. How can we sort an array?

Objectives

  1. Using axes to slice arrays
  2. Index arrays using true and false
  3. Index arrays using arrays of indices

1. Using axes to slice arrays

The solution to the last exercise in the NumPy Basics notebook introduces an important concept when working with NumPy: the axis. The axis argument names the dimension along which a function operates. It applies to any function that reduces multiple values to a single value, such as a sum or a mean.

Let's look at a concrete example with sum:


In [ ]:
# Convention for import to get shortened namespace
import numpy as np

In [ ]:
# Create an array for testing
a = np.arange(12).reshape(3, 4)
a

In [ ]:
# This calculates the total of all values in the array
np.sum(a)

In [ ]:
# Keep this in mind:
a.shape

In [ ]:
# Instead, sum along axis 0 (down the rows), giving one total per column:
np.sum(a, axis=0)

In [ ]:
# Or sum along axis 1 (across the columns), giving one total per row:
np.sum(a, axis=1)
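
One way to keep the two axes straight is to check the shape of the result: the axis you name is the one that disappears. The sketch below reuses the same array as above; the keepdims argument does not appear elsewhere in this notebook and is included only as an illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# axis=0 collapses the 3 rows, leaving one total per column
col_totals = np.sum(a, axis=0)
print(col_totals.shape)  # (4,)

# axis=1 collapses the 4 columns, leaving one total per row
row_totals = np.sum(a, axis=1)
print(row_totals.shape)  # (3,)

# keepdims=True keeps the collapsed axis, with length 1
row_totals_2d = np.sum(a, axis=1, keepdims=True)
print(row_totals_2d.shape)  # (3, 1)
```

Keeping the collapsed axis can be convenient when the result needs to line up against the original array later.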
EXERCISE:
  • Finish the code below to calculate advection. The trick is to figure out how to do the summation.

In [ ]:
# Synthetic data
temp = np.random.randn(100, 50)
u = np.random.randn(100, 50)
v = np.random.randn(100, 50)

# Calculate the gradient components
gradx, grady = np.gradient(temp)

# Turn into an array of vectors:
# axis 0 is x position
# axis 1 is y position
# axis 2 is the vector components
grad_vec = np.dstack([gradx, grady])
print(grad_vec.shape)

# Turn wind components into vector
wind_vec = np.dstack([u, v])

# Calculate advection, the dot product of wind and the negative of gradient
# Don't use np.dot here: it does not compute a per-point dot product
# for these 3-D arrays. Multiply and add instead.

In [ ]:
# %load solutions/advection.py



2. Indexing Arrays with Boolean Values

NumPy can easily create arrays of boolean values and use them to select which values to extract from an array.


In [ ]:
# Create some synthetic data representing temperature and wind speed data
np.random.seed(19990503)  # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
        50 + 2 * np.random.randn(100))
spd = (np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) +
              10 + 5 * np.random.randn(100)))

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(temp, 'tab:red')
plt.plot(spd, 'tab:blue');

By doing a comparison between a NumPy array and a value, we get an array of boolean values representing the result of the comparison between each element and the value.


In [ ]:
temp > 45

We can use the resulting array to index into the NumPy array and retrieve the values where the result was true.


In [ ]:
print(temp[temp > 45])

As long as the shape of the boolean array matches the data, the boolean array can come from anywhere.


In [ ]:
print(temp[spd > 10])
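
To see what is happening element by element, here is a minimal sketch on four hand-made values. The names temp_demo, spd_demo, and windy are hypothetical stand-ins, not part of the workshop data.

```python
import numpy as np

# Small hand-made stand-ins for the temperature and wind speed arrays
temp_demo = np.array([40.0, 52.0, 47.0, 61.0])
spd_demo = np.array([12.0, 3.0, 15.0, 8.0])

# The comparison yields a boolean array with the same shape as spd_demo
windy = spd_demo > 10
print(windy.dtype, windy.shape)

# Summing a boolean array counts the True entries
print(windy.sum())

# The mask built from spd_demo selects the matching positions in temp_demo
print(temp_demo[windy])
```

Here the mask is true at positions 0 and 2, so the last line picks out the temperatures at those two positions.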

In [ ]:
# Make a copy so we don't modify the original data
temp2 = temp.copy()

# Replace all places where spd is <10 with NaN (not a number) so matplotlib skips it
temp2[spd < 10] = np.nan
plt.plot(temp2, 'tab:red')

We can also combine multiple boolean arrays using the bitwise operators. Each comparison MUST be wrapped in parentheses, because the bitwise operators have higher precedence than the comparison operators.


In [ ]:
print(temp[(temp < 45) & (spd > 10)])
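
As a rough sketch of the three operators involved (and, or, not), the example below builds two masks from hand-made values and combines them. The array and mask names are hypothetical and used only for illustration.

```python
import numpy as np

temp_demo = np.array([40.0, 52.0, 47.0, 61.0])
spd_demo = np.array([12.0, 3.0, 15.0, 8.0])

cool = temp_demo < 45
windy = spd_demo > 10

# & is elementwise "and": true only where both masks are true
both = cool & windy

# | is elementwise "or": true where either mask is true
either = cool | windy

# ~ inverts a mask
not_windy = ~windy

print(temp_demo[both])
print(temp_demo[either])
print(temp_demo[not_windy])
```

Without the parentheses, an expression such as temp < 45 & spd > 10 is parsed as temp < (45 & spd) > 10, which is not the intended comparison and will generally fail rather than return the mask you wanted.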
EXERCISE:
  • Heat index is only defined for temperatures >= 80F and relative humidity values >= 40%. Using the data generated below, use boolean indexing to extract the data where heat index has a valid value.

In [ ]:
# Here's the "data"
np.random.seed(19990503)  # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
        80 + 2 * np.random.randn(100))
rh = (np.abs(20 * np.cos(np.linspace(0, 4 * np.pi, 100)) +
              50 + 5 * np.random.randn(100)))


# Create a mask for the two conditions described above
# good_heat_index = 



# Use this mask to grab the temperature and relative humidity values that together
# will give good heat index values
# temp[] ?


# BONUS POINTS: Plot only the data where heat index is defined by
# inverting the mask (using `~mask`) and setting invalid values to np.nan

In [ ]:
# %load solutions/heat_index.py



3. Indexing using arrays of indices

You can also use a list or array of indices to extract particular values--this is a natural extension of the regular indexing. For instance, just as we can select the first element:


In [ ]:
print(temp[0])

We can also extract the first, fifth, and tenth elements:


In [ ]:
print(temp[[0, 4, 9]])

One place this comes into play is sorting NumPy arrays with argsort. This function returns the indices that would put the array's items in sorted order. So for our temp "data":


In [ ]:
inds = np.argsort(temp)
print(inds)

We can pass this array of indices to temp to get its values in sorted order:


In [ ]:
print(temp[inds])
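
A practical reason to sort through indices rather than sorting the values directly is that the same index array can reorder a second, related array. The sketch below assumes two small hand-made arrays, temp_demo and rh_demo, which are hypothetical stand-ins for paired observations.

```python
import numpy as np

# Paired observations: rh_demo[i] was measured alongside temp_demo[i]
temp_demo = np.array([71.0, 65.0, 80.0, 68.0])
rh_demo = np.array([40.0, 55.0, 30.0, 60.0])

# Indices that put temp_demo in ascending order
order = np.argsort(temp_demo)
print(order)

# Applying the same indices to both arrays keeps the pairs together
temp_sorted = temp_demo[order]
rh_by_temp = rh_demo[order]
print(temp_sorted)
print(rh_by_temp)
```

Each humidity value stays matched with its temperature, which would not be the case if each array were sorted on its own.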

Or we can slice inds to only give the 10 highest temperatures:


In [ ]:
ten_highest = inds[-10:]
print(temp[ten_highest])
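
Note that slicing the end of an argsort result gives the highest values in ascending order. If you want the highest value first, one option is to reverse the index array before slicing. A minimal sketch on hypothetical values:

```python
import numpy as np

temp_demo = np.array([71.0, 65.0, 80.0, 68.0, 75.0])
order = np.argsort(temp_demo)

# Last two indices: the two highest values, lowest of the pair first
two_highest = order[-2:]
print(temp_demo[two_highest])

# Reverse the index array first to get the highest value first
two_highest_desc = order[::-1][:2]
print(temp_demo[two_highest_desc])
```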

There are other NumPy arg functions that return indices rather than values. The IPython wildcard search below lists them:


In [ ]:
np.*arg*?
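
As a brief illustration of three of these functions (argmax, argmin, and argwhere), here is a sketch on a small hypothetical array. It is not a complete tour of what the search above returns.

```python
import numpy as np

temp_demo = np.array([71.0, 65.0, 80.0, 68.0])

# Index of the largest and smallest values
hottest = np.argmax(temp_demo)
coldest = np.argmin(temp_demo)
print(hottest, coldest)

# Indices where a condition holds, one row per matching element
warm_positions = np.argwhere(temp_demo > 70)
print(warm_positions.shape)

# The returned indices can be used to look up the values themselves
print(temp_demo[hottest], temp_demo[coldest])
```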
